Chp 4 Linear Regression
Chp 5 Logistic regression

Hands-on Machine Learning with R
Bookclub R-Ladies Utrecht and R-Ladies Den Bosch

Martine Jansen

stRt

  • Organized by @RLadiesUtrecht and @RLadiesDenBosch
  • Meet-ups every 2 weeks on “Hands-On Machine Learning with R”
    by Bradley Boehmke and Brandon Greenwell
  • No session recording! But we will publish the slides and notes!
  • We use HackMD for making shared notes and for the registry:
    https://hackmd.io/rGu7xw2bRS-lm8lq7-wvXw
  • Please keep mic off during presentation. Nice to have camera on and participate to make the meeting more interactive.
  • Questions? Raise hand / write question in HackMD or in chat
  • Remember presenters are not necessarily also subject experts
  • Remember the R-Ladies code of conduct.
    In summary, please be nice to each other and help us make an inclusive meeting!

What did we talk about last time?

  • Target engineering: transform outcome variable via log/Box-Cox
  • Missingness: informative/random, imputation via estimated statistic/KNN
  • Feature filtering: remove (near)-zero variance variables
  • Numeric feature engineering: sometimes useful to transform to reduce skewness; standardization
  • Categorical feature engineering: lumping, one-hot / dummy encoding, label encoding
  • Dimension reduction: see chp 17-19
  • Proper implementation: sequential steps, data leakage, recipes

Part II Supervised Learning
Chp 4 Linear Regression

Approximate (linear) relationship between continuous response variable and set of predictor variables

4.1 Prerequisites

Libraries

library(dplyr)    # for data manipulation
library(ggplot2)  # for graphics

library(caret)    # for cross-validation, etc.
library(rsample)  # needed for initial_split(); you have to scroll
                  # back in the book to find this out
library(vip)      # variable importance
#library(pdp)     # is used in section on variable importance


Code for the data, from previous chps

ames <- AmesHousing::make_ames()

set.seed(123)
split <- initial_split(ames, prop = 0.7, 
                       strata = "Sale_Price")
ames_train  <- training(split)
ames_test   <- testing(split)

4.2 Simple linear regression

If Y and X are (approx) linearly related then: \(Y_i = \beta_0 + \beta_1X_i + \epsilon_i \text{, for } i = 1, ..., n, \text{ and } \epsilon_i \sim N(0,\sigma^2)\)

\(\beta_0\): intercept, average response when X = 0
\(\beta_1\): slope, increase in average response per 1 unit increase in X


Using least squares regression, coefficients can be calculated with lm:

model1 <- lm(Sale_Price ~ Gr_Liv_Area,
             data = ames_train)
model1$coef
(Intercept) Gr_Liv_Area 
 15938.1733    109.6675 
sigma(model1)
[1] 56787.94
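
To make "least squares" a bit more concrete, here is a minimal sketch that computes the intercept, slope and residual standard error by hand and compares them with lm. It uses the built-in mtcars data instead of the Ames data (an assumption, so that it runs without extra packages); the variable names are only illustrative.

```r
# Sketch on the built-in mtcars data (assumption: not the Ames data)
x <- mtcars$wt
y <- mtcars$mpg

# Closed-form least squares estimates for a single predictor
b1 <- cov(x, y) / var(x)        # slope
b0 <- mean(y) - b1 * mean(x)    # intercept

# Residual standard error: sqrt(RSS / (n - p)), with p = 2 coefficients
res <- y - (b0 + b1 * x)
s   <- sqrt(sum(res^2) / (length(y) - 2))

# Compare with lm()
fit <- lm(mpg ~ wt, data = mtcars)
all.equal(unname(coef(fit)), c(b0, b1))  # TRUE
all.equal(sigma(fit), s)                 # TRUE
```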

Inference

  • Point estimates for \(\beta_0\), \(\beta_1\) and \(\sigma\) alone are not that informative
  • We also need to know how much they vary
  • When these assumptions are met:
    • independent observations
    • random errors have mean zero and constant variance
    • random errors are normally distributed
  • then a \(100(1-\alpha)\%\) confidence interval is: \(\widehat{\beta}_j \pm t_{1-\alpha/2, n-p}\widehat{SE}(\widehat{\beta}_j)\)

confint(model1, level = 0.95)
               2.5 %     97.5 %
(Intercept) 8384.213 23492.1336
Gr_Liv_Area  104.920   114.4149
summary(model1)

Call:
lm(formula = Sale_Price ~ Gr_Liv_Area, data = ames_train)

Residuals:
    Min      1Q  Median      3Q     Max 
-474682  -30794   -1678   23353  328183 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 15938.173   3851.853   4.138 3.65e-05 ***
Gr_Liv_Area   109.667      2.421  45.303  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 56790 on 2047 degrees of freedom
Multiple R-squared:  0.5007,    Adjusted R-squared:  0.5004 
F-statistic:  2052 on 1 and 2047 DF,  p-value: < 2.2e-16
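
The confidence interval formula from the Inference slide can be checked by hand. The sketch below rebuilds the interval from the estimates, their standard errors and the t quantile, and compares it with confint. It again uses mtcars as a stand-in for the Ames data (an assumption).

```r
# Sketch on the built-in mtcars data (assumption: not the Ames data)
fit <- lm(mpg ~ wt, data = mtcars)

est <- coef(summary(fit))[, "Estimate"]
se  <- coef(summary(fit))[, "Std. Error"]

alpha  <- 0.05
# n - p residual degrees of freedom
t_crit <- qt(1 - alpha / 2, df = df.residual(fit))

# estimate +/- t quantile * standard error
ci_by_hand <- cbind(lower = est - t_crit * se,
                    upper = est + t_crit * se)

ci_by_hand
confint(fit, level = 0.95)   # same numbers, different column labels
```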

4.3 Multiple linear regression

  • More main effects:
    model2 <- lm(Sale_Price ~ Gr_Liv_Area + Year_Built, data = ames_train)
  • Add an interaction effect with the : operator:
    model2a <- lm(Sale_Price ~ Gr_Liv_Area + Year_Built + Gr_Liv_Area:Year_Built, data = ames_train)
  • Add all the features in the set as main effects:
    model3 <- lm(Sale_Price ~ ., data = ames_train)
  • In linear regression the analyst decides which interaction effects to include; they are not detected automatically
  • When a model contains an interaction effect, also include its constituent main effects
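
As a small illustration of the formula syntax: in R, a * b is shorthand for a + b + a:b, so the main effects are included along with the interaction. A minimal sketch on mtcars (an assumption; the slides use the Ames data):

```r
# Sketch on the built-in mtcars data (assumption: not the Ames data)

# Main effects plus interaction, written out in full
fit_long  <- lm(mpg ~ wt + hp + wt:hp, data = mtcars)

# The * operator expands to the same three terms
fit_short <- lm(mpg ~ wt * hp, data = mtcars)

names(coef(fit_short))
# "(Intercept)" "wt" "hp" "wt:hp"

all.equal(coef(fit_long), coef(fit_short))  # TRUE
```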

4.4 Assessing model accuracy

For this example, “best” model: lowest RMSE via cross-validation

set.seed(123)  # for reproducibility
(cv_model1 <- train(
  form = Sale_Price ~ Gr_Liv_Area, 
  data = ames_train, 
  method = "lm",
  trControl = trainControl(method = "cv", number = 10)
))
Linear Regression 

2049 samples
   1 predictor

No pre-processing
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 1843, 1844, 1844, 1844, 1844, 1844, ... 
Resampling results:

  RMSE      Rsquared  MAE     
  56644.76  0.510273  38851.99

Tuning parameter 'intercept' was held constant at a value of TRUE

The (averaged) RMSE for the 3 main effect models (cv_model2 and cv_model3 are fitted like cv_model1, using the formulas of model2 and model3):

cv_model1$results$RMSE
cv_model2$results$RMSE
cv_model3$results$RMSE
[1] 56644.76
[1] 46865.68
[1] 41691.74

Interpret the cv result as:
When applied to unseen data, the predictions of model 3 are, on average, about $41,691.74 off from the actual sale price.
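
To show where the averaged RMSE comes from, here is a rough base R sketch of 10-fold cross-validation: fit on 9 folds, predict the held-out fold, compute its RMSE, and average over the folds. This is my understanding of what caret reports for method = "cv", not its actual implementation. The data (mtcars) and the fold assignment are assumptions for illustration.

```r
# Sketch on the built-in mtcars data (assumption: not the Ames data)
set.seed(123)
k <- 10
n <- nrow(mtcars)

# Randomly assign every row to one of k folds of (almost) equal size
folds <- sample(rep(seq_len(k), length.out = n))

rmse_per_fold <- sapply(seq_len(k), function(i) {
  train <- mtcars[folds != i, ]   # k - 1 folds for fitting
  test  <- mtcars[folds == i, ]   # 1 held-out fold for assessment
  fit   <- lm(mpg ~ wt, data = train)
  pred  <- predict(fit, newdata = test)
  sqrt(mean((test$mpg - pred)^2)) # RMSE on the held-out fold
})

# The cross-validated RMSE is the average over the k folds
cv_rmse <- mean(rmse_per_fold)
cv_rmse
```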


Looking at adjusted \(R^2\), as I was taught:

summary(model1)$adj.r.squared
summary(model2)$adj.r.squared
summary(model3)$adj.r.squared
[1] 0.5004063
[1] 0.6567094
[1] 0.9339612

Model 3 explains about 93% of the variance in sale price (on the training data)
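
For reference, adjusted \(R^2\) penalizes \(R^2\) for the number of predictors: \(1 - (1 - R^2)\frac{n-1}{n-p-1}\), with p the number of predictors excluding the intercept. A minimal check on mtcars (an assumption; not the Ames data):

```r
# Sketch on the built-in mtcars data (assumption: not the Ames data)
fit <- lm(mpg ~ wt + hp, data = mtcars)

n  <- nrow(mtcars)
p  <- length(coef(fit)) - 1        # predictors, excluding the intercept
r2 <- summary(fit)$r.squared

# Adjusted R-squared by hand
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

all.equal(adj_r2, summary(fit)$adj.r.squared)  # TRUE
```

Note that this is computed on the data the model was fitted on, whereas the cross-validated RMSE is based on held-out folds.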

4.5 Model concerns

Be sure the assumptions hold:

  • Linear relationship; if not, transform
  • Constant variance among residuals (homoscedasticity)
  • No autocorrelation: errors are independent and uncorrelated
  • More observations than predictors; if not, try regularized regression
  • No or little multicollinearity; otherwise it is difficult to separate out the individual effects of the variables
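
One common way to screen for multicollinearity is the variance inflation factor (VIF); the book itself does not necessarily use it here, so take this as an illustrative sketch. The VIF of a predictor is \(1 / (1 - R_j^2)\), where \(R_j^2\) comes from regressing that predictor on all the other predictors. The data (mtcars) and the choice of predictors are assumptions.

```r
# Sketch on the built-in mtcars data (assumption: not the Ames data)
predictors <- c("wt", "hp", "disp")

vif <- sapply(predictors, function(v) {
  others <- setdiff(predictors, v)
  # Regress predictor v on the remaining predictors
  aux <- lm(reformulate(others, response = v), data = mtcars)
  1 / (1 - summary(aux)$r.squared)
})

# Values near 1 indicate little collinearity; large values indicate
# that a predictor is well explained by the others
vif
```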

4.6 Principal component regression

Address multicollinearity for instance by using Principal Components as predictors

set.seed(123)
cv_model_pcr <- train(
  Sale_Price ~ ., 
  data = ames_train, 
  method = "pcr",
  trControl = trainControl(method = "cv",
                           number = 10),
  preProcess = c("zv", "center", "scale"),
  tuneLength = 100)

bestTune <- cv_model_pcr$bestTune[1,1]

ggplot(cv_model_pcr) +
  geom_vline(xintercept = bestTune,
             color = "red")

Questions I have:

  • Why useful? It brings RMSE down, but do we get insight in importance of predictors? Different from regular regression?
  • I thought there are max ncol(data) PC’s
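
A rough sketch of what principal component regression does under the hood may help with both questions: compute principal components of the standardized predictors, regress the response on the first few component scores, and (if wanted) map the coefficients back to the original predictors. The last lines show that the formula interface expands factors into dummy columns, which is one possible reason to see more components than ncol(data). Data sets (mtcars, iris), the predictor selection and m = 2 are assumptions for illustration.

```r
# Sketch on built-in data (assumption: not the Ames data)
X <- scale(as.matrix(mtcars[, c("wt", "hp", "disp", "drat", "qsec")]))
y <- mtcars$mpg

pca <- prcomp(X)                 # X is already centered and scaled
ncol(pca$x)                      # 5: one component per numeric column

m       <- 2                     # number of components to keep
scores  <- pca$x[, seq_len(m), drop = FALSE]
pcr_fit <- lm(y ~ scores)        # regress the response on the scores

# Coefficients on the scale of the standardized original predictors
beta_std <- pca$rotation[, seq_len(m), drop = FALSE] %*% coef(pcr_fit)[-1]
beta_std

# Factors become dummy columns, so columns can outnumber predictors
ncol(iris) - 1                                            # 4 predictors
ncol(model.matrix(Sepal.Length ~ ., data = iris)) - 1     # 5 columns
```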

4.7 Partial least squares

Supervised dimension reduction procedure:

  • that finds new features
  • that not only capture most information in the original features,
  • but are also related to the response
  • PLS places highest weight on variables most strongly related to response

set.seed(123)
cv_model_pls <- train(
  Sale_Price ~ ., 
  data = ames_train, 
  method = "pls",
  trControl = trainControl(method = "cv",
                           number = 10),
  preProcess = c("zv", "center", "scale"),
  tuneLength = 30
)

bestTune <- cv_model_pls$bestTune[1,1]

ggplot(cv_model_pls) +
  geom_vline(xintercept = bestTune,
             color = "red")
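
To illustrate "highest weight on variables most strongly related to response", here is a minimal sketch of the first PLS component for a single response. With standardized predictors, the weight of each predictor is proportional to its correlation with the response. This is a by-hand illustration under those assumptions, not the algorithm the pls package actually runs; the data (mtcars) and predictor selection are also assumptions.

```r
# Sketch on the built-in mtcars data (assumption: not the Ames data)
X <- scale(as.matrix(mtcars[, c("wt", "hp", "disp", "drat", "qsec")]))
y <- mtcars$mpg - mean(mtcars$mpg)   # centered response

# Weights of the first PLS component: proportional to X'y
w <- crossprod(X, y)
w <- w / sqrt(sum(w^2))              # normalize to unit length

# First PLS score: weighted combination of the predictors
t1 <- X %*% w

# Because X is standardized, the weights are proportional to cor(X, y)
cor_xy <- cor(X, y)
cbind(weight = as.vector(w), correlation = as.vector(cor_xy))

# Regress the response on the first component
pls_fit <- lm(y ~ t1)
coef(pls_fit)
```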

Added: how to inspect the best PLS model

the_best_pls <- cv_model_pls$finalModel
the_best_pls$coefficients
, , 1 comps

                                                          .outcome
MS_SubClassOne_Story_1945_and_Older                  -1148.4437622
MS_SubClassOne_Story_with_Finished_Attic_All_Ages     -147.5518230
MS_SubClassOne_and_Half_Story_Unfinished_All_Ages     -337.5525120
MS_SubClassOne_and_Half_Story_Finished_All_Ages       -865.0621444
MS_SubClassTwo_Story_1946_and_Newer                   1790.3091140
MS_SubClassTwo_Story_1945_and_Older                   -294.2140110
MS_SubClassTwo_and_Half_Story_All_Ages                 142.9321733
MS_SubClassSplit_or_Multilevel                        -116.6483406
MS_SubClassSplit_Foyer                                -233.9429173
MS_SubClassDuplex_All_Styles_and_Ages                 -499.7477346
MS_SubClassOne_Story_PUD_1946_and_Newer                474.0964976
MS_SubClassTwo_Story_PUD_1946_and_Newer               -529.4888895
MS_SubClassPUD_Multilevel_Split_Level_Foyer           -339.6653643
MS_SubClassTwo_Family_conversion_All_Styles_and_Ages  -463.2790399
MS_ZoningResidential_High_Density                     -251.1720053
MS_ZoningResidential_Low_Density                      1251.4545722
MS_ZoningResidential_Medium_Density                  -1435.1590405
MS_ZoningA_agr                                        -222.2313362
MS_ZoningC_all                                        -588.5308147
MS_ZoningI_all                                        -188.4592485
Lot_Frontage                                           946.5746179
Lot_Area                                              1268.8685184
StreetPave                                             406.0874043
AlleyNo_Alley_Access                                   541.3907149
AlleyPaved                                               7.2636545
Lot_ShapeSlightly_Irregular                           1286.1324281
Lot_ShapeModerately_Irregular                          604.3108370
Lot_ShapeIrregular                                      56.7309504
Land_ContourHLS                                        996.4744503
Land_ContourLow                                         40.4632900
Land_ContourLvl                                       -313.1923148
UtilitiesNoSeWa                                        -57.5003575
UtilitiesNoSewr                                       -177.2202607
Lot_ConfigCulDSac                                      738.5826335
Lot_ConfigFR2                                          -69.9826939
Lot_ConfigFR3                                          -61.4271755
Lot_ConfigInside                                      -313.3624865
Land_SlopeMod                                          260.4807670
Land_SlopeSev                                          117.4113000
NeighborhoodCollege_Creek                              385.2450000
NeighborhoodOld_Town                                 -1007.7948023
NeighborhoodEdwards                                   -745.5067238
NeighborhoodSomerset                                   759.7798061
NeighborhoodNorthridge_Heights                        2063.7545226
NeighborhoodGilbert                                    103.9408237
NeighborhoodSawyer                                    -593.6664520
NeighborhoodNorthwest_Ames                              88.4540426
NeighborhoodSawyer_West                                 35.1577458
NeighborhoodMitchell                                  -209.0198068
NeighborhoodBrookside                                 -660.1834000
NeighborhoodCrawford                                   324.2803697
NeighborhoodIowa_DOT_and_Rail_Road                    -930.5129973
NeighborhoodTimberland                                 661.5970764
NeighborhoodNorthridge                                1462.4573725
NeighborhoodStone_Brook                               1216.4405229
NeighborhoodSouth_and_West_of_Iowa_State_University   -348.4826044
NeighborhoodClear_Creek                                184.8274460
NeighborhoodMeadow_Village                            -522.6223815
NeighborhoodBriardale                                 -477.5235866
NeighborhoodBloomington_Heights                         77.0457274
NeighborhoodVeenker                                    267.6426497
NeighborhoodNorthpark_Villa                           -219.1978942
NeighborhoodBlueste                                   -116.7666952
NeighborhoodGreens                                      36.1280979
NeighborhoodGreen_Hills                                185.5883156
NeighborhoodLandmark                                   -58.1624595
Condition_1Feedr                                      -520.1024234
Condition_1Norm                                        501.4098425
Condition_1PosA                                        468.1369150
Condition_1PosN                                        297.6228330
Condition_1RRAe                                       -247.5469100
Condition_1RRAn                                        -31.9018760
Condition_1RRNe                                       -140.0522136
Condition_1RRNn                                        -16.9337473
Condition_2Feedr                                      -246.2331177
Condition_2Norm                                        -64.6223929
Condition_2PosA                                        701.2227337
Condition_2PosN                                        146.9542949
Condition_2RRAe                                         12.0203530
Condition_2RRNn                                       -157.6691047
Bldg_TypeTwoFmCon                                     -473.8892388
Bldg_TypeDuplex                                       -499.7477346
Bldg_TypeTwnhs                                        -472.2808302
Bldg_TypeTwnhsE                                        267.4836556
House_StyleOne_and_Half_Unf                           -375.2427958
House_StyleOne_Story                                  -227.7763357
House_StyleSFoyer                                     -371.7836497
House_StyleSLvl                                       -153.8870512
House_StyleTwo_and_Half_Fin                            108.2338188
House_StyleTwo_and_Half_Unf                             19.8667937
House_StyleTwo_Story                                  1086.4492939
Overall_QualPoor                                      -515.9361022
Overall_QualFair                                      -746.2640662
Overall_QualBelow_Average                            -1244.8539342
Overall_QualAverage                                  -1769.0582498
Overall_QualAbove_Average                             -663.0113467
Overall_QualGood                                       757.5514070
Overall_QualVery_Good                                 1981.3986060
Overall_QualExcellent                                 2169.6935109
Overall_QualVery_Excellent                            1675.3822962
Overall_CondPoor                                      -214.6983234
Overall_CondFair                                      -713.1128877
Overall_CondBelow_Average                             -741.1560633
Overall_CondAverage                                   1737.1924359
Overall_CondAbove_Average                             -827.8599673
Overall_CondGood                                      -652.7074623
Overall_CondVery_Good                                 -384.1605700
Overall_CondExcellent                                  203.1139955
Year_Built                                            2695.1687566
Year_Remod_Add                                        2590.1583129
Roof_StyleGable                                      -1221.2730706
Roof_StyleGambrel                                     -233.1829012
Roof_StyleHip                                         1332.4572710
Roof_StyleMansard                                      -33.8872793
Roof_StyleShed                                           4.7670700
Roof_MatlCompShg                                      -309.7073488
Roof_MatlMembran                                        80.2168595
Roof_MatlMetal                                          -1.2216871
Roof_MatlRoll                                          -58.1624595
`Roof_MatlTar&Grv`                                     -41.6040028
Roof_MatlWdShake                                       177.1746492
Roof_MatlWdShngl                                       624.1316792
Exterior_1stAsphShn                                   -168.0885830
Exterior_1stBrkComm                                    -88.5815376
Exterior_1stBrkFace                                    138.5922305
Exterior_1stCemntBd                                    688.1153806
Exterior_1stHdBoard                                   -499.2389760
Exterior_1stImStucc                                    107.3630416
Exterior_1stMetalSd                                   -613.2598674
Exterior_1stPlywood                                   -238.8880993
Exterior_1stStone                                      145.3152758
Exterior_1stStucco                                    -167.0741211
Exterior_1stVinylSd                                   1568.7952257
`Exterior_1stWd Sdng`                                 -900.2362128
Exterior_1stWdShing                                   -291.4766324
Exterior_2ndAsphShn                                   -141.8074904
`Exterior_2ndBrk Cmn`                                 -224.9339687
Exterior_2ndBrkFace                                     77.6712394
Exterior_2ndCBlock                                    -134.9662920
Exterior_2ndCmentBd                                    667.4204831
Exterior_2ndHdBoard                                   -449.7773259
Exterior_2ndImStucc                                    289.7950459
Exterior_2ndMetalSd                                   -555.8733663
Exterior_2ndPlywood                                   -359.2400775
Exterior_2ndStone                                     -115.2956981
Exterior_2ndStucco                                    -188.2596978
Exterior_2ndVinylSd                                   1558.7438005
`Exterior_2ndWd Sdng`                                 -813.6823158
`Exterior_2ndWd Shng`                                 -343.3368950
Mas_Vnr_TypeBrkFace                                   1324.0797599
Mas_Vnr_TypeCBlock                                    -133.6420880
Mas_Vnr_TypeNone                                     -2008.6375162
Mas_Vnr_TypeStone                                     1418.7258844
Mas_Vnr_Area                                          2506.6417188
Exter_QualFair                                        -675.6420742
Exter_QualGood                                        2211.6485539
Exter_QualTypical                                    -2827.8137420
Exter_CondFair                                        -678.8625683
Exter_CondGood                                        -275.8450857
Exter_CondPoor                                        -222.2313362
Exter_CondTypical                                      526.8548066
FoundationCBlock                                     -1708.9537892
FoundationPConc                                       2515.6398926
FoundationSlab                                        -579.5911461
FoundationStone                                       -161.4926451
FoundationWood                                         -23.9167970
Bsmt_QualFair                                         -757.1793721
Bsmt_QualGood                                         1086.4828924
Bsmt_QualNo_Basement                                  -741.6552906
Bsmt_QualPoor                                         -121.7242519
Bsmt_QualTypical                                     -2182.3400136
Bsmt_CondFair                                         -774.2048865
Bsmt_CondGood                                          436.9811493
Bsmt_CondNo_Basement                                  -741.6552906
Bsmt_CondPoor                                         -168.5258661
Bsmt_CondTypical                                       591.6054239
Bsmt_ExposureGd                                       1665.4995527
Bsmt_ExposureMn                                        120.8918187
Bsmt_ExposureNo                                      -1308.1346959
Bsmt_ExposureNo_Basement                              -719.9244922
BsmtFin_Type_1BLQ                                     -610.7652891
BsmtFin_Type_1GLQ                                     2131.0630668
BsmtFin_Type_1LwQ                                     -419.7614684
BsmtFin_Type_1No_Basement                             -741.6552906
BsmtFin_Type_1Rec                                     -780.1299281
BsmtFin_Type_1Unf                                     -463.8420503
BsmtFin_SF_1                                          -629.2909141
BsmtFin_Type_2BLQ                                     -137.5110203
BsmtFin_Type_2GLQ                                      184.3727280
BsmtFin_Type_2LwQ                                     -179.7471593
BsmtFin_Type_2No_Basement                             -741.6552906
BsmtFin_Type_2Rec                                     -223.7964055
BsmtFin_Type_2Unf                                      508.1943062
BsmtFin_SF_2                                            57.1029570
Bsmt_Unf_SF                                            894.3406865
Total_Bsmt_SF                                         2959.0305928
HeatingGasA                                            423.6047097
HeatingGasW                                            -95.7462524
HeatingGrav                                           -362.5116813
HeatingOthW                                            -78.0255196
HeatingWall                                           -337.0108942
Heating_QCFair                                        -687.9197855
Heating_QCGood                                        -641.7433800
Heating_QCPoor                                        -256.7533068
Heating_QCTypical                                    -1584.0900659
Central_AirY                                          1292.9713618
ElectricalFuseF                                       -649.2898911
ElectricalFuseP                                       -283.6421503
ElectricalMix                                         -150.8567401
ElectricalSBrkr                                       1186.8736015
ElectricalUnknown                                      -17.7742372
First_Flr_SF                                          2918.4037670
Second_Flr_SF                                         1420.8988741
Low_Qual_Fin_SF                                       -180.3170043
Gr_Liv_Area                                           3405.8746809
Bsmt_Full_Bath                                        1273.3848143
Bsmt_Half_Bath                                        -120.0428108
Full_Bath                                             2683.0864418
Half_Bath                                             1418.1869064
Bedroom_AbvGr                                          730.5469592
Kitchen_AbvGr                                         -608.4626679
Kitchen_QualFair                                      -794.0775241
Kitchen_QualGood                                      1493.7039581
Kitchen_QualPoor                                       -97.2264777
Kitchen_QualTypical                                  -2528.3486291
TotRms_AbvGrd                                         2406.5489508
FunctionalMaj2                                        -357.4390308
FunctionalMin1                                        -304.9278764
FunctionalMin2                                        -352.4450124
FunctionalMod                                          -85.8385470
FunctionalSal                                         -279.7994393
FunctionalSev                                         -159.5422694
FunctionalTyp                                          610.0041908
Fireplaces                                            2300.6763147
Fireplace_QuFair                                      -135.6926132
Fireplace_QuGood                                      1768.4144038
Fireplace_QuNo_Fireplace                             -2347.7719093
Fireplace_QuPoor                                      -325.3321701
Fireplace_QuTypical                                    774.7168622
Garage_TypeBasment                                    -208.5574909
Garage_TypeBuiltIn                                    1154.5776635
Garage_TypeCarPort                                    -344.2897666
Garage_TypeDetchd                                    -1758.8989288
Garage_TypeMore_Than_Two_Types                        -124.6895280
Garage_TypeNo_Garage                                 -1145.5657892
Garage_FinishNo_Garage                               -1145.5657892
Garage_FinishRFn                                       738.0433643
Garage_FinishUnf                                     -1989.9707336
Garage_Cars                                           3169.2163871
Garage_Area                                           3069.7262496
Garage_QualFair                                       -773.6486669
Garage_QualGood                                        251.7865262
Garage_QualNo_Garage                                 -1145.5657892
Garage_QualPoor                                       -238.6576009
Garage_QualTypical                                    1289.0879398
Garage_CondFair                                       -702.3630117
Garage_CondGood                                         25.5039873
Garage_CondNo_Garage                                 -1145.5657892
Garage_CondPoor                                       -397.4828690
Garage_CondTypical                                    1391.2522862
Paved_DrivePartial_Pavement                           -377.7102850
Paved_DrivePaved                                      1329.4806851
Wood_Deck_SF                                          1584.3250001
Open_Porch_SF                                         1422.4367197
Enclosed_Porch                                        -565.7707888
Three_season_porch                                     200.0853967
Screen_Porch                                           576.8210990
Pool_Area                                              150.1700090
Pool_QCFair                                              0.1025169
Pool_QCGood                                             94.2727281
Pool_QCNo_Pool                                        -285.0605942
Pool_QCTypical                                         -23.9167970
FenceGood_Wood                                        -473.5143028
FenceMinimum_Privacy                                  -810.5623631
FenceMinimum_Wood_Wire                                -209.9229815
FenceNo_Fence                                          942.0871082
Misc_FeatureGar2                                       -56.4460450
Misc_FeatureNone                                       275.5386810
Misc_FeatureOthr                                       -81.2845310
Misc_FeatureShed                                      -260.0896309
Misc_Val                                               -63.7221032
Mo_Sold                                                110.3830413
Year_Sold                                             -182.9407781
Sale_TypeCon                                           103.6155002
Sale_TypeConLD                                        -248.9987301
Sale_TypeConLI                                        -198.3928891
Sale_TypeConLw                                        -113.2881421
Sale_TypeCWD                                            61.0896242
Sale_TypeNew                                          1567.8837282
Sale_TypeOth                                          -195.7197040
Sale_TypeVWD                                           -58.1624595
`Sale_TypeWD `                                        -875.1734085
Sale_ConditionAdjLand                                 -250.5974073
Sale_ConditionAlloca                                  -114.7880266
Sale_ConditionFamily                                  -300.0729306
Sale_ConditionNormal                                  -479.6536897
Sale_ConditionPartial                                 1551.7712987
Longitude                                            -1205.3617909
Latitude                                              1343.9591591

, , 2 comps

                                                          .outcome
MS_SubClassOne_Story_1945_and_Older                   -937.5084623
MS_SubClassOne_Story_with_Finished_Attic_All_Ages      -30.0413629
MS_SubClassOne_and_Half_Story_Unfinished_All_Ages       -6.7195434
MS_SubClassOne_and_Half_Story_Finished_All_Ages         58.6687276
MS_SubClassTwo_Story_1946_and_Newer                   1391.5299269
MS_SubClassTwo_Story_1945_and_Older                    693.3851359
MS_SubClassTwo_and_Half_Story_All_Ages                 918.9398935
MS_SubClassSplit_or_Multilevel                        -444.5899808
MS_SubClassSplit_Foyer                                -351.7886676
MS_SubClassDuplex_All_Styles_and_Ages                 -527.0501714
MS_SubClassOne_Story_PUD_1946_and_Newer               -378.3505978
MS_SubClassTwo_Story_PUD_1946_and_Newer              -1443.7044608
MS_SubClassPUD_Multilevel_Split_Level_Foyer           -664.1650713
MS_SubClassTwo_Family_conversion_All_Styles_and_Ages  -134.4908053
MS_ZoningResidential_High_Density                     -211.2376436
MS_ZoningResidential_Low_Density                      1059.8576439
MS_ZoningResidential_Medium_Density                  -1027.3965469
MS_ZoningA_agr                                        -294.7912025
MS_ZoningC_all                                        -528.8065962
MS_ZoningI_all                                         -97.9054746
Lot_Frontage                                          1852.5310876
Lot_Area                                              2728.3632554
StreetPave                                             414.8440870
AlleyNo_Alley_Access                                   207.8607872
AlleyPaved                                            -145.1710598
Lot_ShapeSlightly_Irregular                           1452.0326063
Lot_ShapeModerately_Irregular                         1021.6388954
Lot_ShapeIrregular                                    -280.8350073
Land_ContourHLS                                       2262.0773609
Land_ContourLow                                        236.3595219
Land_ContourLvl                                      -1268.1056955
UtilitiesNoSeWa                                       -265.8186211
UtilitiesNoSewr                                       -169.3797225
Lot_ConfigCulDSac                                     1403.8898657
Lot_ConfigFR2                                         -610.5489920
Lot_ConfigFR3                                         -120.5294800
Lot_ConfigInside                                      -645.8473087
Land_SlopeMod                                          975.3409123
Land_SlopeSev                                          352.4435712
NeighborhoodCollege_Creek                            -1062.7950456
NeighborhoodOld_Town                                  -298.0101183
NeighborhoodEdwards                                   -982.2576861
NeighborhoodSomerset                                   396.5673477
NeighborhoodNorthridge_Heights                        3846.4606562
NeighborhoodGilbert                                  -1558.7332206
NeighborhoodSawyer                                    -838.7687834
NeighborhoodNorthwest_Ames                            -111.2326058
NeighborhoodSawyer_West                               -793.3424646
NeighborhoodMitchell                                  -307.2311125
NeighborhoodBrookside                                  -55.2722645
NeighborhoodCrawford                                  1570.4820874
NeighborhoodIowa_DOT_and_Rail_Road                    -476.7321490
NeighborhoodTimberland                                 869.5921893
NeighborhoodNorthridge                                2985.9130591
NeighborhoodStone_Brook                               2652.3684562
NeighborhoodSouth_and_West_of_Iowa_State_University   -154.7470892
NeighborhoodClear_Creek                                409.0027312
NeighborhoodMeadow_Village                            -869.9136958
NeighborhoodBriardale                                 -898.7877495
NeighborhoodBloomington_Heights                       -810.5661452
NeighborhoodVeenker                                    469.9263969
NeighborhoodNorthpark_Villa                           -464.4057180
NeighborhoodBlueste                                   -201.8133106
NeighborhoodGreens                                       2.6610907
NeighborhoodGreen_Hills                                820.7675871
NeighborhoodLandmark                                  -169.8362848
Condition_1Feedr                                      -597.6337836
Condition_1Norm                                        476.3349653
Condition_1PosA                                       1165.4848218
Condition_1PosN                                        406.6725871
Condition_1RRAe                                       -726.7875335
Condition_1RRAn                                       -338.2199032
Condition_1RRNe                                       -202.7952031
Condition_1RRNn                                        -52.4961554
Condition_2Feedr                                      -196.2821043
Condition_2Norm                                       -502.5998866
Condition_2PosA                                       1957.3735215
Condition_2PosN                                       -313.9108155
Condition_2RRAe                                        -17.5367637
Condition_2RRNn                                       -100.7039022
Bldg_TypeTwoFmCon                                     -123.1550133
Bldg_TypeDuplex                                       -527.0501714
Bldg_TypeTwnhs                                       -1220.9087185
Bldg_TypeTwnhsE                                       -768.4466594
House_StyleOne_and_Half_Unf                             23.5392788
House_StyleOne_Story                                  -586.5550195
House_StyleSFoyer                                     -635.8085977
House_StyleSLvl                                       -554.6574828
House_StyleTwo_and_Half_Fin                            667.6002950
House_StyleTwo_and_Half_Unf                            608.4692571
House_StyleTwo_Story                                   865.4226125
Overall_QualPoor                                      -472.2657827
Overall_QualFair                                      -689.1264158
Overall_QualBelow_Average                            -1390.2880791
Overall_QualAverage                                  -1897.6359094
Overall_QualAbove_Average                            -1605.7740426
Overall_QualGood                                      -680.0255529
Overall_QualVery_Good                                 2768.1116241
Overall_QualExcellent                                 4938.0657604
Overall_QualVery_Excellent                            4225.6346241
Overall_CondPoor                                       -57.1750202
Overall_CondFair                                      -839.9970346
Overall_CondBelow_Average                             -778.5136244
Overall_CondAverage                                    355.5292004
Overall_CondAbove_Average                             -433.0386811
Overall_CondGood                                       231.7050719
Overall_CondVery_Good                                  302.7393721
Overall_CondExcellent                                 1062.6540334
Year_Built                                            1375.0231660
Year_Remod_Add                                        2381.9350040
Roof_StyleGable                                      -2778.0064971
Roof_StyleGambrel                                       34.3271324
Roof_StyleHip                                         2889.1721513
Roof_StyleMansard                                      -18.2846344
Roof_StyleShed                                         -10.6303679
Roof_MatlCompShg                                      -681.3873815
Roof_MatlMembran                                       275.7901319
Roof_MatlMetal                                          -6.0661338
Roof_MatlRoll                                          -85.1099993
`Roof_MatlTar&Grv`                                    -118.2498750
Roof_MatlWdShake                                       194.9412918
Roof_MatlWdShngl                                      2060.7527449
Exterior_1stAsphShn                                     57.7754831
Exterior_1stBrkComm                                    263.9345826
Exterior_1stBrkFace                                   1216.8267784
Exterior_1stCemntBd                                   1382.6710325
Exterior_1stHdBoard                                  -1079.2155048
Exterior_1stImStucc                                    166.3499242
Exterior_1stMetalSd                                    207.5271413
Exterior_1stPlywood                                   -470.8971852
Exterior_1stStone                                      398.7101223
Exterior_1stStucco                                     184.7585178
Exterior_1stVinylSd                                    130.2638312
`Exterior_1stWd Sdng`                                 -141.6654164
Exterior_1stWdShing                                   -243.3292184
Exterior_2ndAsphShn                                      6.1340494
`Exterior_2ndBrk Cmn`                                 -352.4916266
Exterior_2ndBrkFace                                    593.0158318
Exterior_2ndCBlock                                     -92.9575598
Exterior_2ndCmentBd                                   1316.9006442
Exterior_2ndHdBoard                                   -956.5004725
Exterior_2ndImStucc                                    499.7481977
Exterior_2ndMetalSd                                    292.7391745
Exterior_2ndPlywood                                   -565.6376055
Exterior_2ndStone                                      131.1507162
Exterior_2ndStucco                                     198.9176678
Exterior_2ndVinylSd                                    136.3124285
`Exterior_2ndWd Sdng`                                  147.0411807
`Exterior_2ndWd Shng`                                 -199.6243533
Mas_Vnr_TypeBrkFace                                   1018.1582839
Mas_Vnr_TypeCBlock                                    -334.7483063
Mas_Vnr_TypeNone                                     -1899.8887751
Mas_Vnr_TypeStone                                     1812.8927133
Mas_Vnr_Area                                          4299.8060740
Exter_QualFair                                        -437.4726143
Exter_QualGood                                        1126.3266949
Exter_QualTypical                                    -2955.1780062
Exter_CondFair                                        -350.0871462
Exter_CondGood                                         405.7172221
Exter_CondPoor                                        -294.7912025
Exter_CondTypical                                     -345.7233089
FoundationCBlock                                     -1880.9751925
FoundationPConc                                       1907.0215926
FoundationSlab                                        -228.7681918
FoundationStone                                        174.6766618
FoundationWood                                        -184.2149991
Bsmt_QualFair                                         -496.3993270
Bsmt_QualGood                                        -1425.7384636
Bsmt_QualNo_Basement                                  -327.7323268
Bsmt_QualPoor                                          -41.4259320
Bsmt_QualTypical                                     -1698.5203396
Bsmt_CondFair                                         -370.6089641
Bsmt_CondGood                                          736.0839704
Bsmt_CondNo_Basement                                  -327.7323268
Bsmt_CondPoor                                           -4.6643532
Bsmt_CondTypical                                       -88.5177157
Bsmt_ExposureGd                                       3407.7938718
Bsmt_ExposureMn                                        126.6423970
Bsmt_ExposureNo                                      -2162.8614901
Bsmt_ExposureNo_Basement                              -362.8396160
BsmtFin_Type_1BLQ                                     -470.4603921
BsmtFin_Type_1GLQ                                     2530.1792221
BsmtFin_Type_1LwQ                                     -290.6337007
BsmtFin_Type_1No_Basement                             -327.7323268
BsmtFin_Type_1Rec                                     -479.2902880
BsmtFin_Type_1Unf                                    -1265.2791120
BsmtFin_SF_1                                         -1068.3788630
BsmtFin_Type_2BLQ                                     -177.6448014
BsmtFin_Type_2GLQ                                      635.8449397
BsmtFin_Type_2LwQ                                      -13.9926168
BsmtFin_Type_2No_Basement                             -327.7323268
BsmtFin_Type_2Rec                                     -146.6467279
BsmtFin_Type_2Unf                                      -96.6643119
BsmtFin_SF_2                                           642.7251508
Bsmt_Unf_SF                                            495.0318382
Total_Bsmt_SF                                         4628.4771101
HeatingGasA                                           -287.9268031
HeatingGasW                                            549.9063697
HeatingGrav                                            -77.5682996
HeatingOthW                                             82.5110995
HeatingWall                                           -170.1831344
Heating_QCFair                                        -383.7822776
Heating_QCGood                                        -788.7878590
Heating_QCPoor                                        -199.8365494
Heating_QCTypical                                    -1617.0170810
Central_AirY                                           608.1071135
ElectricalFuseF                                       -215.0829909
ElectricalFuseP                                        -41.5484701
ElectricalMix                                          -67.3748687
ElectricalSBrkr                                        473.0723367
ElectricalUnknown                                      -99.8737345
First_Flr_SF                                          5375.2363277
Second_Flr_SF                                         2522.9597633
Low_Qual_Fin_SF                                        339.6611460
Gr_Liv_Area                                           6251.8653190
Bsmt_Full_Bath                                        2354.0830990
Bsmt_Half_Bath                                        -195.9230466
Full_Bath                                             3438.8434722
Half_Bath                                             1501.9771437
Bedroom_AbvGr                                         1501.3034132
Kitchen_AbvGr                                         -470.4079083
Kitchen_QualFair                                      -396.4610015
Kitchen_QualGood                                       -64.2706379
Kitchen_QualPoor                                        -3.5249495
Kitchen_QualTypical                                  -2932.4444118
TotRms_AbvGrd                                         4483.8916289
FunctionalMaj2                                        -422.6333824
FunctionalMin1                                        -234.2191337
FunctionalMin2                                        -123.9664572
FunctionalMod                                          227.8848749
FunctionalSal                                         -399.2678154
FunctionalSev                                         -398.2479794
FunctionalTyp                                          405.2722231
Fireplaces                                            3954.5427922
Fireplace_QuFair                                      -339.9747727
Fireplace_QuGood                                      2875.3520265
Fireplace_QuNo_Fireplace                             -3471.8034609
Fireplace_QuPoor                                      -491.4246734
Fireplace_QuTypical                                    537.5658114
Garage_TypeBasment                                    -467.1285565
Garage_TypeBuiltIn                                    1334.8220224
Garage_TypeCarPort                                    -588.4703468
Garage_TypeDetchd                                    -1317.4474641
Garage_TypeMore_Than_Two_Types                        -124.6513666
Garage_TypeNo_Garage                                  -482.4973970
Garage_FinishNo_Garage                                -482.4973970
Garage_FinishRFn                                      -724.2405724
Garage_FinishUnf                                     -1565.3043964
Garage_Cars                                           3929.4357551
Garage_Area                                           4123.0878453
Garage_QualFair                                       -125.5564243
Garage_QualGood                                        734.4198324
Garage_QualNo_Garage                                  -482.4973970
Garage_QualPoor                                        -54.7368882
Garage_QualTypical                                     141.4588905
Garage_CondFair                                       -428.1829039
Garage_CondGood                                        261.9521529
Garage_CondNo_Garage                                  -482.4973970
Garage_CondPoor                                       -189.6901765
Garage_CondTypical                                     609.9526078
Paved_DrivePartial_Pavement                             18.5792425
Paved_DrivePaved                                       505.5649376
Wood_Deck_SF                                          2482.3265048
Open_Porch_SF                                         1776.7662370
Enclosed_Porch                                         461.1671179
Three_season_porch                                     461.0639096
Screen_Porch                                          1700.1969051
Pool_Area                                              210.3403281
Pool_QCFair                                            -16.2242372
Pool_QCGood                                           -274.3085284
Pool_QCNo_Pool                                        -662.6390024
Pool_QCTypical                                         108.9413801
FenceGood_Wood                                        -280.4913986
FenceMinimum_Privacy                                  -434.7606050
FenceMinimum_Wood_Wire                                -157.1656058
FenceNo_Fence                                          391.3298297
Misc_FeatureGar2                                        -0.3404585
Misc_FeatureNone                                       352.9078926
Misc_FeatureOthr                                        51.4330178
Misc_FeatureShed                                      -253.3382609
Misc_Val                                              -571.7104741
Mo_Sold                                                 18.7040377
Year_Sold                                             -313.8377364
Sale_TypeCon                                           276.0887625
Sale_TypeConLD                                        -112.8348366
Sale_TypeConLI                                        -351.0251116
Sale_TypeConLw                                        -240.9182437
Sale_TypeCWD                                           198.4752139
Sale_TypeNew                                          1570.8178157
Sale_TypeOth                                          -249.0030254
Sale_TypeVWD                                          -129.0249487
`Sale_TypeWD `                                        -673.1532978
Sale_ConditionAdjLand                                  -24.1377113
Sale_ConditionAlloca                                   125.8615142
Sale_ConditionFamily                                  -799.0015021
Sale_ConditionNormal                                  -113.5054866
Sale_ConditionPartial                                 1540.5550499
Longitude                                             -222.3334046
Latitude                                              1643.8770521

, , 3 comps

                                                         .outcome
MS_SubClassOne_Story_1945_and_Older                   -657.796964
MS_SubClassOne_Story_with_Finished_Attic_All_Ages       82.528781
MS_SubClassOne_and_Half_Story_Unfinished_All_Ages      491.173805
MS_SubClassOne_and_Half_Story_Finished_All_Ages        682.645158
MS_SubClassTwo_Story_1946_and_Newer                   1112.874597
MS_SubClassTwo_Story_1945_and_Older                    970.915842
MS_SubClassTwo_and_Half_Story_All_Ages                 920.410705
MS_SubClassSplit_or_Multilevel                        -715.740201
MS_SubClassSplit_Foyer                                  66.512272
MS_SubClassDuplex_All_Styles_and_Ages                -1495.060295
MS_SubClassOne_Story_PUD_1946_and_Newer               -958.436938
MS_SubClassTwo_Story_PUD_1946_and_Newer              -1050.978655
MS_SubClassPUD_Multilevel_Split_Level_Foyer           -478.567827
MS_SubClassTwo_Family_conversion_All_Styles_and_Ages  -798.353525
MS_ZoningResidential_High_Density                      -23.860477
MS_ZoningResidential_Low_Density                       501.113266
MS_ZoningResidential_Medium_Density                   -747.372334
MS_ZoningA_agr                                        -688.270405
MS_ZoningC_all                                       -1071.912608
MS_ZoningI_all                                        -317.415054
Lot_Frontage                                          1147.821858
Lot_Area                                              2391.046317
StreetPave                                             624.278243
AlleyNo_Alley_Access                                  -190.981496
AlleyPaved                                             293.543077
Lot_ShapeSlightly_Irregular                           1199.690086
Lot_ShapeModerately_Irregular                         1793.565109
Lot_ShapeIrregular                                   -1876.493191
Land_ContourHLS                                       2532.979091
Land_ContourLow                                       -402.456962
Land_ContourLvl                                        -73.592231
UtilitiesNoSeWa                                       -699.443994
UtilitiesNoSewr                                       -445.464859
Lot_ConfigCulDSac                                     2116.789179
Lot_ConfigFR2                                         -917.876926
Lot_ConfigFR3                                         -164.396989
Lot_ConfigInside                                      -491.763924
Land_SlopeMod                                          801.635304
Land_SlopeSev                                         -348.963023
NeighborhoodCollege_Creek                            -1258.801689
NeighborhoodOld_Town                                  -579.115132
NeighborhoodEdwards                                  -2070.874822
NeighborhoodSomerset                                  1211.351600
NeighborhoodNorthridge_Heights                        5324.777666
NeighborhoodGilbert                                  -2943.703372
NeighborhoodSawyer                                    -610.801946
NeighborhoodNorthwest_Ames                            -643.280950
NeighborhoodSawyer_West                              -1019.144965
NeighborhoodMitchell                                   -25.463573
NeighborhoodBrookside                                  610.155991
NeighborhoodCrawford                                  2179.463438
NeighborhoodIowa_DOT_and_Rail_Road                    -332.213969
NeighborhoodTimberland                                 729.944911
NeighborhoodNorthridge                                4622.757564
NeighborhoodStone_Brook                               3997.307912
NeighborhoodSouth_and_West_of_Iowa_State_University   -526.330084
NeighborhoodClear_Creek                               -226.914738
NeighborhoodMeadow_Village                            -879.819381
NeighborhoodBriardale                                 -407.597015
NeighborhoodBloomington_Heights                      -1509.275134
NeighborhoodVeenker                                    268.445939
NeighborhoodNorthpark_Villa                            242.220504
NeighborhoodBlueste                                     48.499637
NeighborhoodGreens                                      64.792598
NeighborhoodGreen_Hills                               2293.412259
NeighborhoodLandmark                                   -23.921148
Condition_1Feedr                                     -1487.119074
Condition_1Norm                                       1625.945801
Condition_1PosA                                       1338.376242
Condition_1PosN                                         36.586032
Condition_1RRAe                                       -909.653133
Condition_1RRAn                                       -475.533806
Condition_1RRNe                                       -203.700663
Condition_1RRNn                                       -287.961705
Condition_2Feedr                                      -309.678856
Condition_2Norm                                         -5.161076
Condition_2PosA                                       2653.239441
Condition_2PosN                                      -2391.394938
Condition_2RRAe                                       -208.166023
Condition_2RRNn                                        -86.675435
Bldg_TypeTwoFmCon                                     -713.169416
Bldg_TypeDuplex                                      -1495.060295
Bldg_TypeTwnhs                                        -804.086530
Bldg_TypeTwnhsE                                      -1217.535509
House_StyleOne_and_Half_Unf                            474.141536
House_StyleOne_Story                                  -880.808936
House_StyleSFoyer                                     -248.868701
House_StyleSLvl                                       -864.783668
House_StyleTwo_and_Half_Fin                            356.546580
House_StyleTwo_and_Half_Unf                            243.726159
House_StyleTwo_Story                                   830.022614
Overall_QualPoor                                      -644.607044
Overall_QualFair                                     -1314.271580
Overall_QualBelow_Average                            -1743.027325
Overall_QualAverage                                  -2078.702423
Overall_QualAbove_Average                            -2323.397222
Overall_QualGood                                     -1401.473524
Overall_QualVery_Good                                 3276.151475
Overall_QualExcellent                                 7733.684825
Overall_QualVery_Excellent                            5799.258458
Overall_CondPoor                                      -388.030850
Overall_CondFair                                     -1864.585240
Overall_CondBelow_Average                            -1654.276013
Overall_CondAverage                                   -651.388763
Overall_CondAbove_Average                              129.848537
Overall_CondGood                                      1281.681862
Overall_CondVery_Good                                 1252.375744
Overall_CondExcellent                                 1758.248550
Year_Built                                            1729.692272
Year_Remod_Add                                        3225.434315
Roof_StyleGable                                      -1686.407216
Roof_StyleGambrel                                     -214.879154
Roof_StyleHip                                         2125.533491
Roof_StyleMansard                                     -682.803331
Roof_StyleShed                                        -486.478325
Roof_MatlCompShg                                       311.016224
Roof_MatlMembran                                       303.226629
Roof_MatlMetal                                        -118.035740
Roof_MatlRoll                                         -293.364252
`Roof_MatlTar&Grv`                                    -898.567054
Roof_MatlWdShake                                      -357.032572
Roof_MatlWdShngl                                      3594.716287
Exterior_1stAsphShn                                     68.133082
Exterior_1stBrkComm                                    399.183204
Exterior_1stBrkFace                                   2172.416320
Exterior_1stCemntBd                                   1470.601559
Exterior_1stHdBoard                                  -1055.521249
Exterior_1stImStucc                                    188.914784
Exterior_1stMetalSd                                    768.372180
Exterior_1stPlywood                                   -616.407816
Exterior_1stStone                                      454.074801
Exterior_1stStucco                                    -251.649479
Exterior_1stVinylSd                                     79.746652
`Exterior_1stWd Sdng`                                 -652.058240
Exterior_1stWdShing                                   -517.579871
Exterior_2ndAsphShn                                   -129.251921
`Exterior_2ndBrk Cmn`                                  227.641748
Exterior_2ndBrkFace                                    711.738722
Exterior_2ndCBlock                                     -54.799204
Exterior_2ndCmentBd                                   1321.777865
Exterior_2ndHdBoard                                   -929.256483
Exterior_2ndImStucc                                    584.716346
Exterior_2ndMetalSd                                    893.459275
Exterior_2ndPlywood                                  -1044.292122
Exterior_2ndStone                                      -12.414018
Exterior_2ndStucco                                    -267.396783
Exterior_2ndVinylSd                                     82.782586
`Exterior_2ndWd Sdng`                                  142.092383
`Exterior_2ndWd Shng`                                 -280.643635
Mas_Vnr_TypeBrkFace                                    113.552546
Mas_Vnr_TypeCBlock                                    -777.728187
Mas_Vnr_TypeNone                                      -525.976818
Mas_Vnr_TypeStone                                     1140.766269
Mas_Vnr_Area                                          4733.015528
Exter_QualFair                                        -950.415264
Exter_QualGood                                         606.675198
Exter_QualTypical                                    -3149.629666
Exter_CondFair                                        -984.968559
Exter_CondGood                                         781.652610
Exter_CondPoor                                        -688.270405
Exter_CondTypical                                     -478.161088
FoundationCBlock                                     -2073.685795
FoundationPConc                                       2042.730770
FoundationSlab                                        -437.859627
FoundationStone                                        211.012761
FoundationWood                                        -190.152058
Bsmt_QualFair                                         -554.284997
Bsmt_QualGood                                        -2489.323388
Bsmt_QualNo_Basement                                  -557.532422
Bsmt_QualPoor                                          -12.616472
Bsmt_QualTypical                                     -2129.585212
Bsmt_CondFair                                         -888.329079
Bsmt_CondGood                                          846.004002
Bsmt_CondNo_Basement                                  -557.532422
Bsmt_CondPoor                                          -98.957799
Bsmt_CondTypical                                       269.184501
Bsmt_ExposureGd                                       3996.881645
Bsmt_ExposureMn                                       -405.151651
Bsmt_ExposureNo                                      -2265.857965
Bsmt_ExposureNo_Basement                              -654.049233
BsmtFin_Type_1BLQ                                     -433.220031
BsmtFin_Type_1GLQ                                     3567.341829
BsmtFin_Type_1LwQ                                     -719.687522
BsmtFin_Type_1No_Basement                             -557.532422
BsmtFin_Type_1Rec                                     -417.765345
BsmtFin_Type_1Unf                                    -2295.316721
BsmtFin_SF_1                                         -2050.556799
BsmtFin_Type_2BLQ                                     -341.883088
BsmtFin_Type_2GLQ                                     1159.182511
BsmtFin_Type_2LwQ                                      255.457374
BsmtFin_Type_2No_Basement                             -557.532422
BsmtFin_Type_2Rec                                     -556.430081
BsmtFin_Type_2Unf                                     -221.999953
BsmtFin_SF_2                                           859.201813
Bsmt_Unf_SF                                           -727.164210
Total_Bsmt_SF                                         4788.601393
HeatingGasA                                            -47.741657
HeatingGasW                                            523.270354
HeatingGrav                                           -244.693314
HeatingOthW                                            -52.497437
HeatingWall                                           -428.174229
Heating_QCFair                                        -886.098701
Heating_QCGood                                        -789.470919
Heating_QCPoor                                        -384.847003
Heating_QCTypical                                    -1832.892776
Central_AirY                                          1125.267022
ElectricalFuseF                                       -316.087274
ElectricalFuseP                                         84.449800
ElectricalMix                                          -81.098716
ElectricalSBrkr                                        337.001466
ElectricalUnknown                                       40.764980
First_Flr_SF                                          5483.803955
Second_Flr_SF                                         3235.173739
Low_Qual_Fin_SF                                        115.102795
Gr_Liv_Area                                           6909.265891
Bsmt_Full_Bath                                        3219.538150
Bsmt_Half_Bath                                        -685.011810
Full_Bath                                             3736.390529
Half_Bath                                             1941.854086
Bedroom_AbvGr                                          798.786696
Kitchen_AbvGr                                        -1813.360813
Kitchen_QualFair                                      -757.519086
Kitchen_QualGood                                     -1080.902122
Kitchen_QualPoor                                        63.062670
Kitchen_QualTypical                                  -3544.458402
TotRms_AbvGrd                                         4046.764882
FunctionalMaj2                                        -991.639412
FunctionalMin1                                        -905.763288
FunctionalMin2                                        -431.307927
FunctionalMod                                          -84.831386
FunctionalSal                                         -956.370313
FunctionalSev                                        -1175.971852
FunctionalTyp                                         1680.422662
Fireplaces                                            3381.636035
Fireplace_QuFair                                      -577.900829
Fireplace_QuGood                                      1606.286689
Fireplace_QuNo_Fireplace                             -2478.311050
Fireplace_QuPoor                                      -664.251904
Fireplace_QuTypical                                     66.988573
Garage_TypeBasment                                   -1213.373562
Garage_TypeBuiltIn                                    1299.094121
Garage_TypeCarPort                                    -989.977836
Garage_TypeDetchd                                     -855.314111
Garage_TypeMore_Than_Two_Types                        -515.080776
Garage_TypeNo_Garage                                    39.159237
Garage_FinishNo_Garage                                  39.159237
Garage_FinishRFn                                     -2013.330521
Garage_FinishUnf                                     -1053.070232
Garage_Cars                                           4215.932393
Garage_Area                                           4360.960895
Garage_QualFair                                         32.862865
Garage_QualGood                                        963.833180
Garage_QualNo_Garage                                    39.159237
Garage_QualPoor                                        -78.966970
Garage_QualTypical                                    -496.013722
Garage_CondFair                                       -878.639064
Garage_CondGood                                        236.502435
Garage_CondNo_Garage                                    39.159237
Garage_CondPoor                                       -431.919200
Garage_CondTypical                                     507.443021
Paved_DrivePartial_Pavement                            -57.705945
Paved_DrivePaved                                       648.389813
Wood_Deck_SF                                          2883.560319
Open_Porch_SF                                          845.549475
Enclosed_Porch                                         658.823481
Three_season_porch                                     748.822559
Screen_Porch                                          2373.448110
Pool_Area                                             -733.178528
Pool_QCFair                                           -121.922959
Pool_QCGood                                          -2150.199715
Pool_QCNo_Pool                                        -181.142866
Pool_QCTypical                                         -34.717025
FenceGood_Wood                                         209.763191
FenceMinimum_Privacy                                   202.637323
FenceMinimum_Wood_Wire                                  42.615237
FenceNo_Fence                                         -452.719485
Misc_FeatureGar2                                       199.128314
Misc_FeatureNone                                       557.781148
Misc_FeatureOthr                                       277.197775
Misc_FeatureShed                                       -91.339255
Misc_Val                                             -2399.813910
Mo_Sold                                               -512.204913
Year_Sold                                             -425.866531
Sale_TypeCon                                           711.772879
Sale_TypeConLD                                         -63.155014
Sale_TypeConLI                                        -500.500845
Sale_TypeConLw                                        -469.475064
Sale_TypeCWD                                           365.660799
Sale_TypeNew                                          1091.947209
Sale_TypeOth                                          -107.779942
Sale_TypeVWD                                          -321.043158
`Sale_TypeWD `                                         165.002983
Sale_ConditionAdjLand                                   70.855584
Sale_ConditionAlloca                                   216.022157
Sale_ConditionFamily                                 -1533.348043
Sale_ConditionNormal                                  1289.544776
Sale_ConditionPartial                                 1005.554932
Longitude                                             -344.696083
Latitude                                              2225.813451

4.8 Feature interpretation

  • Variable importance: identify the variables that are most influential in the model
  • Linear regression: often the absolute value of the t-statistic of each parameter
  • Difficult to interpret when the model has interactions and transformations
  • PLS: contributions of the coefficients, weighted proportionally to the reduction in RSS
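
The |t|-statistic idea for linear regression can be sketched in a few lines of base R. This is only an illustration: the built-in `mtcars` data and the chosen predictors are assumptions that stand in for the Ames model from the slides.

```r
# Illustrative only: mtcars stands in for the Ames data used in the slides.
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Coefficient table with columns: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- summary(fit)$coefficients

# Importance = absolute t-statistic, intercept dropped, largest first
t_importance <- sort(abs(coefs[-1, "t value"]), decreasing = TRUE)
t_importance
```

The ranking only makes sense if the model is reasonably specified; with interactions or transformed terms a single t-statistic no longer describes a feature's full contribution.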

Calculate VIP in PLS
(100 is most important):

vip(cv_model_pls, 
    num_features = 20,
    method = "model")

PDP - partial dependence plots

  • Plot the change in the average predicted value as the specified feature(s) vary over their marginal distribution
  • For linear models: shows how a fixed change in a predictor relates to a fixed linear change in the outcome, while taking into account the average effect of all other features in the model
  • More useful in the case of non-linear relationships (chp 16)
# This is NOT a ggplot!
pdp::partial(cv_model_pls,
             "Gr_Liv_Area", 
             grid.resolution = 20, 
             plot = TRUE)
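
What a partial dependence computation does can be sketched by hand. The example below is an assumption-laden illustration (built-in `mtcars` data, a small `lm` instead of the PLS model) and not how `pdp::partial()` is implemented internally.

```r
# Illustrative only: mtcars and this small lm are assumptions, not the PLS model.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Grid of values for the feature of interest
wt_grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 20)

# For each grid value: set wt to that value in every row,
# predict, and average the predictions over all rows
pd <- sapply(wt_grid, function(v) {
  d <- mtcars
  d$wt <- v
  mean(predict(fit, newdata = d))
})

head(data.frame(wt = wt_grid, avg_prediction = pd))

# For an additive linear model the partial dependence is a straight line
# whose slope equals the wt coefficient
slope <- diff(pd) / diff(wt_grid)
```

Because the model is linear and additive, the curve carries no more information than the coefficient itself, which is why PDPs become more interesting for non-linear models.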


Part II Supervised Learning
Chp 5 Logistic Regression

Approximate the relationship between a binary response variable and a set of predictor variables

5.1 Prerequisites

Libraries

library(dplyr)    # for data manipulation
library(ggplot2)  # for graphics
library(caret)    # for cross-validation, etc.
library(rsample)  # necessary for initial_split
library(vip)      # variable importance
# library(modeldata)
# library(broom)
# library(ROCR)


Code for the data, from previous chps

# attrition <- rsample::attrition # line in book chp1  no longer works

# data are moved into the `modeldata` package
df <- modeldata::attrition %>%
  # make all factors unordered
  mutate_if(is.ordered, factor, ordered = FALSE)

set.seed(123)  # for reproducibility
churn_split <- initial_split(df, prop = .7, strata = "Attrition")
churn_train <- training(churn_split)
churn_test  <- testing(churn_split)

5.2 Why logistic regression

The formula of a sigmoid function looks complicated:

\[ p(X) = \frac {e^{\beta_0+\beta_1X}}{1+e^{\beta_0+\beta_1X}} \]

Look at odds:

\[ \frac {p(X)} {1-p(X)} = \frac {e^{\beta_0+\beta_1X}}{1+e^{\beta_0+\beta_1X}} / \frac {1}{1+e^{\beta_0+\beta_1X}} = e^{\beta_0+\beta_1X} \]

And then take log, and call that logit (the log of the odds):

\[ \log \left( \frac {p(X)} {1-p(X)}\right) = \log \left(e^{\beta_0+\beta_1X} \right) = \beta_0+\beta_1X \]
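
The three formulas can be checked numerically in base R. The values of `beta0`, `beta1` and `x` below are arbitrary illustrative choices.

```r
# Arbitrary illustrative coefficients and predictor values
beta0 <- -1
beta1 <- 0.5
x <- c(-2, 0, 3)

eta   <- beta0 + beta1 * x          # linear predictor
p     <- exp(eta) / (1 + exp(eta))  # sigmoid: probability
odds  <- p / (1 - p)                # odds
logit <- log(odds)                  # log of the odds

all.equal(p, plogis(eta))      # plogis() is the built-in sigmoid
all.equal(odds, exp(eta))      # odds equal exp(beta0 + beta1 * x)
all.equal(logit, eta)          # logit equals the linear predictor
all.equal(logit, qlogis(p))    # qlogis() is the built-in logit
```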

5.3 Simple logistic regression

Models are fitted by maximum likelihood estimation
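
As a rough sketch of what maximum likelihood means here: choose the coefficients that make the observed outcomes most probable. The example uses simulated data and a generic optimiser; all names and values are assumptions for illustration, and `glm()` itself uses a different algorithm (iteratively reweighted least squares) to reach the same solution.

```r
# Simulated data; all names and values here are illustrative assumptions.
set.seed(1)
x <- rnorm(200)
y <- rbinom(200, size = 1, prob = plogis(-1 + 0.8 * x))

# Negative log-likelihood of a logistic model
# with intercept b[1] and slope b[2]
neg_loglik <- function(b) {
  eta <- b[1] + b[2] * x
  -sum(y * eta - log1p(exp(eta)))
}

# Minimise the negative log-likelihood numerically
fit_optim <- optim(c(0, 0), neg_loglik, method = "BFGS")

# glm() solves the same problem
fit_glm <- glm(y ~ x, family = "binomial")

# The two sets of estimates should agree closely
rbind(optim = fit_optim$par, glm = unname(coef(fit_glm)))
```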

model1 <- glm(Attrition ~ MonthlyIncome,
              family = "binomial", 
              data = churn_train)
broom::tidy(model1)[,1:2] %>%
  knitr::kable(booktabs = TRUE)

term             estimate
(Intercept)     -0.8860896
MonthlyIncome   -0.0001386

For each 1-unit increase in MonthlyIncome:

  • the logit of attrition decreases by 0.000139
  • the odds of attrition are multiplied by exp(-0.000139) = 0.99986
  • hence the odds become smaller, and so does the probability
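
A 1-unit change in income is tiny, so the multiplicative effect looks negligible. A small sketch with the estimate from the table shows the effect of a larger step; the step size of 1000 is an arbitrary choice for illustration.

```r
# MonthlyIncome estimate taken from the coefficient table of model1
b1 <- -0.0001386

or_1    <- exp(b1)         # odds multiplier for +1 unit of income
or_1000 <- exp(b1 * 1000)  # odds multiplier for +1000 units of income

round(c(or_1 = or_1, or_1000 = or_1000), 5)
```

An increase of 1000 in MonthlyIncome multiplies the odds of attrition by roughly 0.87, so the odds are about 13% lower.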

Confidence interval for coefficient

broom::tidy(model1)
# A tibble: 2 x 5
  term           estimate std.error statistic      p.value
  <chr>             <dbl>     <dbl>     <dbl>        <dbl>
1 (Intercept)   -0.886    0.157         -5.64 0.0000000174
2 MonthlyIncome -0.000139 0.0000272     -5.10 0.000000344 


# for the logit coefficients:
confint(model1)
                      2.5 %        97.5 %
(Intercept)   -1.1932606571 -5.761048e-01
MonthlyIncome -0.0001948723 -8.803311e-05


# for the odds scale (multiplicative effects on the odds):
exp(confint(model1))
                  2.5 %    97.5 %
(Intercept)   0.3032309 0.5620835
MonthlyIncome 0.9998051 0.9999120

5.4 Multiple logistic regression

Explaining attrition from MonthlyIncome and OverTime:

model3 <- glm(
  Attrition ~ MonthlyIncome + OverTime,
  family = "binomial", 
  data = churn_train
  )

broom::tidy(model3)
# A tibble: 3 x 5
  term           estimate std.error statistic  p.value
  <chr>             <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)   -1.33     0.177         -7.54 4.74e-14
2 MonthlyIncome -0.000147 0.0000280     -5.27 1.38e- 7
3 OverTimeYes    1.35     0.180          7.50 6.59e-14

# different from book:
# add column "pred" with the predicted probabilities from model3,
# and column "prob" with the observed outcome coded as 0/1
churn_train3 <-
  modelr::add_predictions(churn_train, model = model3, type = "response") %>%
  mutate(prob = ifelse(Attrition == "Yes", 1, 0))

# also different from book
ggplot(churn_train3, 
       aes(x = MonthlyIncome, color = OverTime)) +
  geom_point(aes(y = prob), alpha = .15) +       # observations
  geom_point(aes(y = pred)) +                    # predictions
  labs(title = "Predicted probabilities for model3",
       x = "Monthly Income",
       y = "Probability of Attrition")

5.5 Assessing model accuracy - how well models predict

Attrition ~ MonthlyIncome

set.seed(123)
cv_model1 <- train(
  Attrition ~ MonthlyIncome, 
  data = churn_train, 
  method = "glm",
  family = "binomial",
  trControl = trainControl(method = "cv",
                           number = 10))

pred_class1 <- predict(cv_model1, 
                       churn_train)

confusionMatrix(
  data = relevel(pred_class1,
                 ref = "Yes"), 
  reference = 
    relevel(churn_train$Attrition,
            ref = "Yes")
) 

Attrition ~ .

set.seed(123)
cv_model3 <- train(
  Attrition ~ ., 
  data = churn_train, 
  method = "glm",
  family = "binomial",
  trControl = trainControl(method = "cv",
                           number = 10))

pred_class3 <- predict(cv_model3, 
                       churn_train)

confusionMatrix(
  data = relevel(pred_class3,
                 ref = "Yes"), 
  reference = 
    relevel(churn_train$Attrition,
                      ref = "Yes")
) 

Attrition ~ MonthlyIncome

Confusion Matrix and Statistics

          Reference
Prediction Yes  No
       Yes   0   0
       No  165 863
                                          
               Accuracy : 0.8395          
                 95% CI : (0.8156, 0.8614)
    No Information Rate : 0.8395          
    P-Value [Acc > NIR] : 0.5208          
                                          
                  Kappa : 0               
                                          
 Mcnemar's Test P-Value : <2e-16          
                                          
            Sensitivity : 0.0000          
            Specificity : 1.0000          
         Pos Pred Value :    NaN          
         Neg Pred Value : 0.8395          
             Prevalence : 0.1605          
         Detection Rate : 0.0000          
   Detection Prevalence : 0.0000          
      Balanced Accuracy : 0.5000          
                                          
       'Positive' Class : Yes             
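
Why does this model never predict “Yes”? A plausible explanation, sketched below with the rounded coefficients of model1 reported earlier: intercept and slope are both negative, so for any non-negative income the predicted probability stays below 0.5, which is the cut-off for a two-class prediction when the class with the highest probability is chosen. The income values are hypothetical.

```r
# Coefficients of model1 as reported in the slides
b0 <- -0.8860896
b1 <- -0.0001386

# Hypothetical monthly incomes, including 0 as the most extreme case
income <- c(0, 1000, 5000, 20000)

# Predicted probability of attrition ("Yes")
p_yes <- plogis(b0 + b1 * income)
round(p_yes, 3)

# No probability exceeds the 0.5 cut-off, so every case is classified "No"
any(p_yes > 0.5)
```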
                                          

Attrition ~ .

Confusion Matrix and Statistics

          Reference
Prediction Yes  No
       Yes  83  20
       No   82 843
                                          
               Accuracy : 0.9008          
                 95% CI : (0.8809, 0.9184)
    No Information Rate : 0.8395          
    P-Value [Acc > NIR] : 8.982e-09       
                                          
                  Kappa : 0.5658          
                                          
 Mcnemar's Test P-Value : 1.542e-09       
                                          
            Sensitivity : 0.50303         
            Specificity : 0.97683         
         Pos Pred Value : 0.80583         
         Neg Pred Value : 0.91135         
             Prevalence : 0.16051         
         Detection Rate : 0.08074         
   Detection Prevalence : 0.10019         
      Balanced Accuracy : 0.73993         
                                          
       'Positive' Class : Yes             
                                          

No Information Rate (0.8395): predicting the most common outcome (“No”) for everyone already gives an accuracy of 83.9%.
Accuracy: P(pred = actual), (TP+TN)/(TP+FP+TN+FN)
Sensitivity (recall): P(pred = “yes”| actual = “yes”), TP / (TP + FN)
Specificity: P(pred = “no”| actual = “no”), TN / (TN + FP)
Pos Pred Value (precision): P(actual = “yes”| pred = “yes”), TP / (TP + FP)
Neg Pred Value: P(actual = “no”| pred = “no”), TN / (TN + FN)
Prevalence: P(actual = “yes”), (TP+FN)/(TP+FP+TN+FN)
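
These definitions can be verified by hand with the counts from the confusion matrix of `Attrition ~ .` (TP = 83, FP = 20, FN = 82, TN = 843). The variable names are chosen for illustration only.

```r
# Counts from the confusion matrix of Attrition ~ . ("Yes" is the positive class)
TP <- 83; FP <- 20
FN <- 82; TN <- 843
n  <- TP + FP + FN + TN

metrics <- c(
  accuracy    = (TP + TN) / n,
  sensitivity = TP / (TP + FN),
  specificity = TN / (TN + FP),
  ppv         = TP / (TP + FP),
  npv         = TN / (TN + FN),
  prevalence  = (TP + FN) / n,
  # no information rate: share of the most common actual class
  nir         = max(TP + FN, FP + TN) / n
)
round(metrics, 4)
```

The rounded values reproduce the statistics printed by `confusionMatrix()` above.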

ROC curve

library(ROCR)

m1_prob <- predict(cv_model1, 
       churn_train, type = "prob")$Yes
m3_prob <- predict(cv_model3, 
       churn_train, type = "prob")$Yes

# Compute ROC curves (true positive rate vs. false positive rate) for both models
perf1 <- prediction(m1_prob, 
                    churn_train$Attrition) %>%
  performance(measure = "tpr", 
              x.measure = "fpr")
perf2 <- prediction(m3_prob, 
                    churn_train$Attrition) %>%
  performance(measure = "tpr", 
              x.measure = "fpr")

plot(perf1, col = "black", lty = 2)
plot(perf2, add = TRUE, col = "blue")
legend(0.8, 0.2, legend = c("cv_model1", "cv_model3"),
       col = c("black", "blue"), lty = 2:1, cex = 0.6)
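
The ROC curves are often summarised by the area under the curve (AUC). With ROCR this should be available via `performance(..., measure = "auc")`; the base R sketch below shows the idea behind the number using the rank (Mann-Whitney) formulation. The simulated scores and the helper `auc_rank()` are hypothetical and not part of the book's code.

```r
# Simulated scores; "Yes" cases tend to score higher (illustrative assumption)
set.seed(42)
labels <- factor(rep(c("No", "Yes"), times = c(80, 20)),
                 levels = c("No", "Yes"))
scores <- c(rnorm(80, mean = 0), rnorm(20, mean = 1))

# AUC = probability that a random positive case gets a higher score
# than a random negative case (ties count as one half)
auc_rank <- function(scores, labels, positive = "Yes") {
  r     <- rank(scores)
  n_pos <- sum(labels == positive)
  n_neg <- sum(labels != positive)
  (sum(r[labels == positive]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

auc_rank(scores, labels)
```

An AUC of 0.5 corresponds to random guessing and 1 to perfect separation of the two classes.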

5.6 Model concerns

  • It is also important to check the adequacy of the model
  • The concept of a residual is harder to define than in linear regression
  • The book gives some literature references

5.7 Feature interpretation

vip(cv_model3, num_features = 20)
  • Logistic regression assumes a monotonic linear relationship on the logit scale
  • On the probability scale the relationship is nonlinear, see the PDPs.
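
The difference between the two scales can be made concrete with a small sketch. The coefficients `b0` and `b1` are arbitrary illustrative values, not estimates from `cv_model3`.

```r
# Arbitrary illustrative coefficients
b0 <- 2
b1 <- -1
x  <- 0:5

logit <- b0 + b1 * x   # linear in x
prob  <- plogis(logit) # S-shaped in x

diff(logit)            # equal steps on the logit scale
round(diff(prob), 4)   # unequal steps on the probability scale
```

The same 1-unit change in a feature therefore changes the predicted probability by different amounts, depending on where on the curve an observation sits.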

5.8 Final thoughts

  • Logistic regression also suffers from many assumptions (e.g. a linear relationship on the logit scale, no multicollinearity)
  • Often more than two classes to predict (multinomial classification)
  • Future chapters discuss more advanced algorithms for binary and multinomial classification